AIbase

# Vision Transformer Architecture

Sapiens Seg 0.6b
Sapiens is a family of Vision Transformer models pretrained on 300 million human images at 1024x1024 resolution and designed for human-centric vision tasks.
Image Segmentation, English
facebook
16
0
Best Model ViTB16 GPT2
A cross-modal model that combines a Vision Transformer (ViT) with GPT-2 to generate natural-language descriptions of input images.
Image-to-Text, Transformers, Supports Multiple Languages
evlinzxxx
15
0
Dog Breeds Multiclass Image Classification With Vit
MIT
A dog breed classification model fine-tuned from Google's Vision Transformer architecture that recognizes 120 dog breeds.
Image Classification, Transformers
wesleyacheng
584
4
Big Cat Classifier
An image classifier based on the Vision Transformer architecture that identifies five species of big cats.
Image Classification, Transformers
smaranjitghose
93
1
© 2025 AIbase